FM-index for Dummies

Authors

  • Szymon Grabowski
  • Marcin Raniszewski
  • Sebastian Deorowicz
Abstract

The FM-index is a celebrated compressed data structure for full-text pattern searching. After the first wave of interest in its theoretical developments, the last few years have seen a surge of interest in practical FM-index variants. These enhancements are often related to a bit-vector representation augmented with an efficient rank-handling data structure. In this work, we propose a new, cache-friendly implementation of the rank primitive and advocate a very simple FM-index architecture that trades compression ratio for speed. Experimental results show that our variants are 2–3 times faster than the fastest known ones, at the price of typically using 1.5–5 times more space.
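To make the idea of a cache-friendly rank primitive concrete, here is a minimal sketch of one common approach: interleaving a running count with the bits it summarizes, so that a rank query reads a single contiguous block. This is not the authors' implementation; the block size (one counter word plus seven 64-bit data words, 64 bytes in total) and the names `build_interleaved` and `rank1` are assumptions chosen for illustration. Python will not show the cache effect itself; the sketch only demonstrates the layout and the query logic.

```python
WORD_BITS = 64
DATA_WORDS = 7                       # assumed: 1 counter + 7 data words = 64 bytes
BLOCK_BITS = WORD_BITS * DATA_WORDS  # 448 payload bits per block


def popcount(word):
    """Number of set bits in a non-negative integer."""
    return bin(word).count("1")


def build_interleaved(bits):
    """Pack a list of 0/1 ints into [count, w0..w6, count, w0..w6, ...].

    Each 'count' is the number of 1s that precede its block. The range
    runs to len(bits) inclusive so that rank1(len(bits)) always finds a block.
    """
    layout = []
    ones_so_far = 0
    for start in range(0, len(bits) + 1, BLOCK_BITS):
        layout.append(ones_so_far)
        for w in range(DATA_WORDS):
            base = start + w * WORD_BITS
            word = 0
            for j, b in enumerate(bits[base:base + WORD_BITS]):
                word |= b << j       # bit j of the word holds bits[base + j]
            layout.append(word)
            ones_so_far += popcount(word)
    return layout


def rank1(layout, i):
    """Number of 1s in bits[0:i], reading only one block of the layout."""
    block, offset = divmod(i, BLOCK_BITS)
    base = block * (DATA_WORDS + 1)
    full_words, rem = divmod(offset, WORD_BITS)
    count = layout[base]             # 1s before this block
    for w in range(full_words):      # whole words before position i
        count += popcount(layout[base + 1 + w])
    if rem:                          # low 'rem' bits of the partial word
        count += popcount(layout[base + 1 + full_words] & ((1 << rem) - 1))
    return count
```

For example, `rank1(build_interleaved([1, 0, 1, 1]), 3)` counts the 1s among the first three bits. In a compiled language, the counter and the data words of a block would share one cache line, which is the point of the interleaving.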

Similar resources

Reusing an FM-index

Intuitively, if two strings S1 and S2 are sufficiently similar and we already have an FM-index for S1 then, by storing a little extra information, we should be able to reuse parts of that index in an FM-index for S2. We formalize this intuition and show that it can lead to significant space savings in practice, as well as to some interesting theoretical problems.

Groups in which every subgroup has finite index in its Frattini closure

In 1970, Menegazzo [Gruppi nei quali ogni sottogruppo è intersezione di sottogruppi massimali, Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 48 (1970), 559--562.] gave a complete description of the structure of soluble $IM$-groups, i.e., groups in which every subgroup can be obtained as intersection of maximal subgroups. A group $G$ is said to have the $FM$...

A bloated FM-index reducing the number of cache misses during the search

The FM-index is a well-known compressed full-text index, based on the Burrows–Wheeler transform (BWT). During a pattern search, the BWT sequence is accessed at “random” locations, which is cache-unfriendly. In this paper, we are interested in speeding up the FM-index by working on q-grams rather than individual characters, at the cost of using more space. The first presented variant is related t...

FM-index of alignment with gaps

Recently, a compressed index for similar strings, called the FM-index of alignment (FMA), has been proposed with the functionalities of pattern search and random access. The FMA is quite efficient in space requirement and pattern search time, but it is applicable only for an alignment of similar strings without gaps. In this paper we propose the FM-index of alignment with gaps, a realistic inde...

The FM-Index: A Compressed Full-Text Index Based on the BWT

In this talk we address the issue of indexing compressed data both from the theoretical and the practical point of view. We start by introducing the FM-index data structure [2] that supports substring searches and occupies a space which is a function of the entropy of the indexed data. The key feature of the FM-index is that it encapsulates the indexed data (self-index) and achieves the space r...
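As a rough illustration of how substring search over a BWT-based index works, the sketch below counts pattern occurrences with the standard backward-search recurrence. It is deliberately uncompressed and is not taken from the talk: the names `build_index` and `count`, the `$` sentinel (assumed absent from the text and smaller than every text character), and the plain prefix-count tables are all assumptions made for readability. A real FM-index would replace those tables with compact rank structures.

```python
def build_index(text, sentinel="$"):
    """Build a toy index: the BWT plus C and Occ tables (uncompressed).

    Assumes 'sentinel' does not occur in 'text' and sorts before all
    of its characters.
    """
    s = text + sentinel
    # Naive suffix sorting; fine for a sketch, too slow for large inputs.
    suffix_array = sorted(range(len(s)), key=lambda k: s[k:])
    bwt = "".join(s[k - 1] for k in suffix_array)

    # C[c] = number of characters in s that are strictly smaller than c.
    C = {}
    total = 0
    for c in sorted(set(s)):
        C[c] = total
        total += s.count(c)

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in C}
    for i, ch in enumerate(bwt):
        for c in C:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ, len(bwt)


def count(index, pattern):
    """Count occurrences of 'pattern' using backward search."""
    C, occ, n = index
    lo, hi = 0, n                    # current suffix-array range [lo, hi)
    for c in reversed(pattern):      # process the pattern right to left
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

A short usage example: `count(build_index("abracadabra"), "abra")` returns 2. Each pattern character costs two `occ` lookups at unrelated positions, which is where an efficient rank structure matters.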


Publication date: 2017